Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems
Abstract
The ability to checkpoint a running application and restart it later can provide many useful benefits, including fault recovery, advanced resource sharing, dynamic load balancing, and improved service availability. However, applications often involve multiple processes that have dependencies through the operating system. We present a transparent mechanism for commodity operating systems that can checkpoint multiple processes in a consistent state so that they can be restarted correctly at a later time. We introduce an efficient algorithm for recording process relationships and correctly saving and restoring shared state in a manner that leverages existing operating system kernel functionality. We have implemented our system as a loadable kernel module and user-space utilities in Linux. We demonstrate its ability on real-world applications to provide transparent checkpoint-restart functionality without modifying, recompiling, or relinking applications, libraries, or the operating system kernel. Our results show checkpoint and restart times 3 to 55 times faster than OpenVZ and 5 to 1100 times faster than Xen.
Similar resources
CRAK: Linux Checkpoint/Restart As a Kernel Module
Process checkpoint/restart is a very useful technology for process migration, load balancing, crash recovery, transaction rollback, job control, and many other purposes. Although process migration has not yet been widely used and is not widely available in commercial systems, the growing shift of computing facilities from supercomputers to networked workstations and distributed systems is incre...
Linux-CR: Transparent Application Checkpoint-Restart in Linux
Application checkpoint-restart is the ability to save the state of a running application so that it can later resume its execution from the time of the checkpoint. Application checkpoint-restart provides many useful benefits, including fault recovery, advanced resource sharing, dynamic load balancing, and improved service availability. For several years the Linux kernel has been gaining the nece...
Recovery Techniques to Improve File System
RECOVERY TECHNIQUES TO IMPROVE FILE SYSTEM RELIABILITY. Swaminathan Sundararaman. We implement selective restart and resource reservation for commodity file systems to improve their reliability. Selective restart allows file systems to quickly recover from failures; resource reservation enables file systems to avoid certain failures altogether. Together they enable a new class of more robust and ...
The Design and Implementation of Berkeley Lab’s Linux Checkpoint/Restart
Clusters of commodity computers running Linux are becoming an increasingly popular platform for high-performance computing, as they provide the best price/performance ratio in the marketplace. But while the size and raw power of Linux clusters continue to increase, many aspects of their software environments continue to lag behind those provided by proprietary supercomputing systems. One featur...
The KeyKOS Nanokernel Architecture
The KeyKOS nanokernel is a capability-based object-oriented operating system that has been in production use since 1983. Its original implementation was motivated by the need to provide security, reliability, and 24-hour availability for applications on the Tymnet® hosts. Requirements included the ability to run multiple instantiations of several operating systems on a single hardware system. K...
Publication date: 2007